(54) Cell library method

(57) This invention provides a method for selecting 
a clone of a cell library containing a mutation in a gene 
that is expressed in a test cell comprising: (a) providing 
cDNA obtained by reverse transcription of mRNA of the 
test cell; (b) providing a collection of cultured cells of 
said library organized into individual clones, wherein 
each clone is of a cell having a mutation in an exon in 
its genome, the mutation being in a different exon in cells 
of different clones; (c) providing an array of different sin- 
gle stranded polynucleotides, the polynucleotides being 
fragments of exons containing mutations in (b); (d) exposing
the cDNA to the array under conditions permit-
ting hybridization of polynucleotides in the array to nu- 
cleic acids; (e) detecting hybridization of cDNA to a poly- 
nucleotide on the array; and, (f) selecting a clone in the 
collection from which a hybridizing polynucleotide detected
at (e) is an exon fragment. This invention also
provides a system for testing expression of a gene in a 
test cell. Also provided is a preferred exon trap vector 
for mutating ES cells. 
Description

FIELD OF THE INVENTION 

[0001] This invention relates to libraries of cells in
which the genomes of members of the library are modified
by gene trapping.

BACKGROUND 

[0002] Genome-wide mutagenesis in lower organ- 
isms (e.g. bacteria, nematodes, yeast, zebra fish and 
Drosophila) followed by screening or selection for mu- 
tants using phenotypic assays has proven to be a useful 
methodology for revealing gene function in these organ- 
isms. 

[0003] The mouse provides a very useful mammalian 
animal model for studying gene function. The mouse 
model possesses significant advantages because of its 
evolutionary relatedness to humans, similarity to hu- 
mans with respect to the development of complex tis- 
sues and organs, and because it provides opportunity 
to rapidly identify homologous genes through regions of
genomic synteny. Large-scale mutagenesis programs
using mice now play a significant role in the study of
mammalian gene function (Brown & Nolan (1998) Human
Molecular Genetics, 7:1627-1633). The mutagen
of choice for use in large-scale mouse studies is
N-ethyl-N-nitrosourea (ENU), which is administered to
male mice.

[0004] Technological advances in culture and maintenance
of embryonic stem (ES) cells have provided new
opportunities for study of eukaryotic genomes including 
that of the mouse. Murine ES cells are derived from the 
inner cell mass of about a 3.5 day embryo or blastocyst 
and can be maintained in an undifferentiated, pluripo- 
tent state in culture. ES cells can be genetically manip- 
ulated in vitro and these cells may subsequently be in- 
troduced into an embryo by blastocyst microinjection or 
embryo aggregation techniques. Upon reintroduction in- 
to the embryo, ES cells can contribute to the formation 
of all tissues of the resulting chimeric organism. ES cell 
contribution to germ cells of the reproductive organs re- 
sults in germline transmission of mutations introduced 
into the ES cell genome. For these reasons, mutation of 
ES cells is used as another means for generating mu- 
tations in the mouse genome. For example, murine ES 
cells may be irradiated (Brown & Nolan [supra]) or mu- 
tated through the use of insertional mutagenesis such 
as transposon tagging, retroviral integration, or gene 
trap mutagenesis. 

[0005] Screening strategies in mouse mutagenesis 
programs vary according to the phenotype under study and
according to the means by which mutations are produced.
For example, various expression-based strategies
are described for screening cell lines or animals derived
from ES cells in which a gene trap vector has been
used to generate a mutation (e.g. Baker, et al. (1997)
Dev. Biol., 185:201-14; Kuwano, R. (1996) Zool. Sci.,
13:277-83; Wurst, et al. (1995) Genetics, 139:889-99;
and PCT application published January 21, 1999 under
WO 99/02719). While the above-described methodologies
which make use of large-scale mutagenesis are
used for study of the murine genome, gene sequence
based systems have also been developed and are concurrently
used for analysis of the mouse genome. The
latter approach is expected to be used in parallel with
mutagenic approaches to provide an enlarged catalogue
of mouse mutations and phenotypes for gene
function studies (Brown & Nolan [supra]).
[0006] The current gene sequence based strategy of
choice for the mouse makes use of the production of a
library of ES gene trap clones indexed by either polynucleotide
fragments derived from regions flanking the site
of gene trap integration or by DNA sequence information
derived from such fragments. The premise behind this
approach is that most mammalian genes will soon be
characterized from sequences of "expressed sequence
tags" (ESTs). An example of such an ES cell library is
known as Omnibank™ and is described, for example, by
Brown & Nolan [supra], Zambrowicz, et al. (1998) Nature,
392:608-611, and in United States Patents
6,136,566 and 6,207,371. Another example is described
in Wiles, M.V. et al. (2000) Nature Genetics,
24:13-14. Such libraries may be generated by introducing
an exon trap vector into ES cells and cloning separate
cell lines representing individual trap vector integration
events. The exon trap vector described by Zambrowicz,
et al. (e.g. construct VICTR 20) comprises an upstream
mutagenic cassette containing a splice acceptor (SA)
sequence fused to a selectable reporter gene followed
by a polyadenylation (polyA) sequence. This portion of
the vector interrupts expression of the endogenous
gene. A downstream portion of the trap vector ensures
that integration of the trap into an exon may be detected
without transcription of the endogenous gene. This
downstream portion contains a promoter functional in
the ES cell, linked to a reporter gene followed by a splice
donor (SD) sequence. The promoter drives expression
of the reporter gene together with endogenous DNA
downstream to an endogenous polyA site. Sequence
tags from endogenous (trapped) genes may be readily
recovered using 3' RACE-PCR, which generates polynucleotides
corresponding to the regions which flank the
site of integration of the vector. Furthermore, disruption
of the endogenous gene by an exon trap vector permits
one to readily generate transgenic and "knock-out" mice
which are heterozygous for the mutation or are entirely
deficient in the trapped gene function. This is accomplished
using the ES cell methodologies described
above. Chimeric animals that are generated by this procedure
may be bred to provide homologous mutants.
Further information regarding the construction and use
of exon trap vectors, amplification of flanking regions,
and generation of chimeric animals is found in WO
99/02719.
SUMMARY OF THE INVENTION

[0007] This invention results from the inventor recog- 
nizing that mutant ES cell libraries such as Omnibank™ 
are not used to their full potential because these libraries 
are addressed, searched, or otherwise accessed 
through use of known or predetermined sequence information
or probes. As is described in United States Patents
6,136,566 and 6,207,371, such a library works by
indexing representative samples of mutant ES cell 
clones to polynucleotide fragments derived from the ex- 
on of the mutant cell into which the trap vector has be- 
come integrated. Actual fragments may be stored in 
some fashion and made available for hybridization stud- 
ies against pre-designed or selected oligonucleotide 
probes, or the fragments are represented in a sequence 
database. In the case of a database, the indexing sys- 
tem of the library is addressed by searching the data- 
base for sequences similar to a pre-selected target se- 
quence. In either case, the end result is the identification 
of a fragment (or fragment sequence) which is indexed 
(associated) to a particular ES cell clone. The particular 
clone may then be made available for further study, in- 
cluding for generation of mice mutated at the site of the 
fragment in the mouse genome. 
[0008] This invention is based on the inventor also 
recognizing that a mutant ES cell library as described 
above need not be addressed, searched, or otherwise 
accessed using known sequences or pre-selected 
probes. Rather, the library may be addressed as part of 
a screening method, with the result being that indexed 
ES cell clones are identified as being relevant to the 
screen, without the user having any pre-existing knowl- 
edge or assumptions about the underlying genes in- 
volved. Regardless, the user has immediate access to 
genes that are relevant to the screen and immediate ac- 
cess to sequence information associated with the gene. 
[0009] The inventor recognizes that in order to use an 
ES cell library to its full potential in a screening method- 
ology, it is necessary that the methodology function on 
a scale commensurate with the size of the library. This 
requires that the screening assay be unlike the traditional
phenotypic or expression screens used in analysis of
the results of mutagenesis programs. It is possible to 
make full use of an ES cell library directly in a screening 
method by employing nucleic acid microarrays to ad- 
dress the library indexing system and to act as an inter- 
face with test samples. Nucleic acid microarrays permit 
the testing of complex nucleic acid samples for hybridi- 
zation against literally thousands of polynucleotide frag- 
ments simultaneously. 

[0010] By combining the use of nucleic acid microar- 
rays with current ES cell library methodologies, it is now 
possible to address the indexing of such a library
by interaction with a complex nucleic acid sample.
By addressing the library, it is meant that an association 
is made between a single hybridization event on the 
microarray and a corresponding member of the library.
The corresponding member of the library is, or is representative
of, a sample of the very ES cell clone from which
the fragment on the microarray to which hybridization
occurs is derived, and in which a mutation exists at the
location of the fragment in the genome of an ES cell in
the library.

[0011] This invention may be used for screening sam- 
ples representative of a particular biological condition 
(such as a disease state or stage of cellular differentiation)
and comparison may be made to samples taken
from cells having different biological conditions or
states. The difference in hybridization patterns on nucleic
acid microarrays as between the two biological
conditions may be readily correlated to the members of
the ES cell library used to generate the nucleic acid
microarray. The user then has immediate access to ES 
cell clones in which genes that are differentially affected 
are tagged by insertion mutagenesis and are also avail- 
able for sequencing or generation of knock-out organ- 
isms. 

[0012] Accordingly, this invention provides a method 
for selecting a clone of an ES cell containing a mutation 
in a gene that is expressed in a test cell comprising: 

(a) providing cDNA obtained by reverse transcrip- 
tion of mRNA of the test cell; 

(b) providing a collection of cultured ES cells organ- 
ized into individual clones, wherein each clone is of 
an ES cell having a mutation in an exon of its ge- 
nome, the mutation being in a different exon in cells 
of different clones; 

(c) providing an array of different single stranded 
polynucleotides, the polynucleotides being frag- 
ments of the exons containing mutations in (b); 

(d) exposing the cDNA to the array under conditions 
permitting hybridization of polynucleotides in the ar- 
ray to nucleic acids; 

(e) detecting hybridization of a polynucleotide on 
the array; and, 

(f) selecting a clone in the collection from which a 
hybridizing polynucleotide detected at (e) is an exon 
fragment. 
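
Steps (e) and (f) amount to a lookup from hybridizing array spots to indexed clones. The following is a minimal sketch only, assuming hybridization is scored as a numeric signal per spot and that a spot is called positive at or above a user-chosen threshold. The function name, spot labels, clone identifiers, signal values, and threshold are all hypothetical and are not taken from this disclosure.

```python
from typing import Dict, List


def select_clones(signals: Dict[str, float],
                  spot_to_clone: Dict[str, str],
                  threshold: float) -> List[str]:
    """Return clone IDs whose exon-fragment spots hybridized at or above threshold."""
    selected: List[str] = []
    for spot, signal in signals.items():
        # Step (e): a spot counts as hybridizing when its signal meets the threshold.
        if signal >= threshold and spot in spot_to_clone:
            clone = spot_to_clone[spot]
            # Step (f): select the clone the fragment came from (once only).
            if clone not in selected:
                selected.append(clone)
    return selected


# Hypothetical index and hypothetical scanner readings.
spot_to_clone = {"A1": "clone_001", "A2": "clone_002", "A3": "clone_003"}
signals = {"A1": 1520.0, "A2": 80.0, "A3": 940.0}

hits = select_clones(signals, spot_to_clone, threshold=500.0)
print(hits)
```

Here "selecting" is limited to selecting data representative of clones; locating or segregating the physical clones would follow separately.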

[0013] This invention also includes a method for com- 
paring gene expression between test cells, comprising: 

(a) providing at least two cDNA samples, each sample
obtained by reverse transcription of mRNA of a
different test cell; 

(b) providing a collection of cultured ES cells organ- 
ized into individual clones, wherein each clone is of 
an ES cell having a mutation in an exon of its ge- 
nome, the mutation being in a different exon in cells 
of different clones; 

(c) providing at least one array of different single 
stranded polynucleotides, the polynucleotides be- 
ing fragments of the exons containing mutations in 
(b); 
(d) exposing the cDNA samples to the at least one
array under conditions permitting hybridization of 
polynucleotides in the array to nucleic acids; 

(e) detecting hybridization of polynucleotides in the 
at least one array resulting from exposure to the cDNA
samples;

(f) selecting clones in the collection from which hy- 
bridizing polynucleotides detected at (e) are exon 
fragments; and, 

(g) comparing a clone or clones which comprise exon
fragments that hybridize to one of the cDNA
samples to a clone or clones which comprise exon 
fragments that hybridize to another of the cDNA 
samples. 
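
Where the comparison in step (g) is made on data rather than on phenotypes, it can be pictured as a set comparison between the clones selected for each cDNA sample. The sketch below is illustrative only; the function name, the sample labels, and the clone identifiers are hypothetical.

```python
from typing import Dict, Iterable, List


def compare_clone_sets(first: Iterable[str],
                       second: Iterable[str]) -> Dict[str, List[str]]:
    """Partition selected clones into those unique to each sample and those shared."""
    a, b = set(first), set(second)
    return {
        "only_first": sorted(a - b),   # fragments hybridized only to the first cDNA sample
        "only_second": sorted(b - a),  # fragments hybridized only to the second cDNA sample
        "shared": sorted(a & b),       # fragments hybridized to both samples
    }


# Hypothetical clones selected for two test cells (e.g. a standard and a cell of interest).
standard_hits = ["clone_001", "clone_003", "clone_007"]
test_hits = ["clone_003", "clone_004"]

comparison = compare_clone_sets(standard_hits, test_hits)
print(comparison)
```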

[0014] This invention also provides a system for test- 
ing expression of a gene in a test cell, comprising: 

(a) a collection of cultured ES cells organized into 
individual clones, wherein each clone is of an ES 
cell having a mutation in an exon of its genome, the 
mutation being in a different exon in cells of different 
clones; and, 

(b) an array comprising at least 500 different single 
stranded polynucleotides on a solid support sur- 
face, the polynucleotides being fragments of the ex- 
ons containing mutations in (a). 

[0015] This invention provides a system comprising 
the combination of a collection of cultured cells and at 
least one nucleic acid microarray comprising an array 
of polynucleotides, wherein the collection and the array 
are as described above. This combination may addition- 
ally comprise a recorded index, which is a record of the 
association of individual clones in the collection to the 
position or positions in the array that coincide with poly- 
nucleotides derived from the individual clone. This re- 
corded index may be a database stored on a computer- 
readable medium. Such a recorded index may also 
comprise information associated with the clone or the 
derived polynucleotides such as sequence information. 
The combination may additionally comprise a computer- 
readable medium which comprises instructions for exe- 
cuting a computer implemented method for searching a 
database comprising the recorded index; for providing 
a record of a pattern of hybridization on an array; or, for 
providing a statistical analysis of such a pattern. An out- 
put from such a method for statistical analysis may be 
coupled to the recorded index (e.g. through the search- 
ing method) so as to associate an analysis of a hybrid- 
ization pattern with information concerning associated 
clones. 
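
One way to picture the recorded index is as a two-way mapping between clones and array positions, with optional sequence information attached to each clone. The following is a sketch under stated assumptions, not a description of any particular database: the class names, the (row, column) position format, and the example records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # assumed (row, column) address of a spot on the array


@dataclass
class CloneRecord:
    clone_id: str
    positions: List[Position]       # a clone's fragment may be spotted more than once
    sequence: Optional[str] = None  # sequence information, if available


class RecordedIndex:
    """Associates individual clones with the array positions of their fragments."""

    def __init__(self) -> None:
        self._by_clone: Dict[str, CloneRecord] = {}
        self._by_position: Dict[Position, str] = {}

    def add(self, record: CloneRecord) -> None:
        self._by_clone[record.clone_id] = record
        for pos in record.positions:
            self._by_position[pos] = record.clone_id

    def clone_at(self, pos: Position) -> Optional[CloneRecord]:
        """Look up the clone whose fragment sits at an array position."""
        clone_id = self._by_position.get(pos)
        return self._by_clone[clone_id] if clone_id is not None else None


index = RecordedIndex()
index.add(CloneRecord("clone_001", [(0, 0), (5, 2)], "ACGTTGCA"))
index.add(CloneRecord("clone_002", [(0, 1)]))

hit = index.clone_at((5, 2))
print(hit.clone_id if hit else "no clone at this position")
```

Output from a hybridization analysis could then be coupled to such an index by passing each positive position to `clone_at`.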

[0016] It will be appreciated that the methods of the 
present invention are also applicable to various cell li- 
braries as known in the art. In particular, the source cells 
used to make the cell library can be from a mammalian 
cell source. 

[0017] In this invention, "selecting" a clone or clones
may be limited to selecting data in a database, which
data is representative of a clone or clones of ES cells,
or the method may include locating such a clone or
clones in a physical collection of cells organized into
clones. "Selecting" may also include physically segregating
cells of a clone so located. Since many genes
may be expressed in a test cell and many polynucleotides
may be present on the array, these methods may
involve simultaneous hybridization of multiple oligonucleotides
to cDNA, thereby permitting multiple clones to
be "selected" in the method of this invention. "Selecting"
a clone or clones may additionally comprise producing
an organism from a cell present in a selected clone. The
animal may be heterozygous or homozygous for the mutation
in the clone.

[0018] "Comparing clones" of this invention may in- 
clude comparing data pertaining to individual clones as 
described above, or such "comparing" may be a com- 
parison of phenotypes of cells of the clones or phenotypes
of organisms derived from cells of the clones.
[0019] In this invention, collections of cultured cells
comprising mutations are preferably produced using exon
trapping methodologies such as those known in the
art and exemplified below. To facilitate production of
knock-out organisms from the cultured ES cells, the
gene trapping vector should be one which interrupts expression
of the exon into which the vector integrates. To
facilitate production of the array, the vector should be
capable of being a primer target for PCR amplification.
Preferably the trap vector will include a reporter driven
by a promoter that is functional in the ES cells.
[0020] The array used in this invention is preferably a
nucleic acid microarray as is known in the art. Such
microarrays contain a large plurality of polynucleotide
spots stably associated with a solid support surface.
Preferably, different polynucleotides used in a single array
are not capable of cross-hybridization. However,
multiple spots each containing the same or complementary
polynucleotides may be present. Typical polynucleotide
lengths range from about 120 to about 1000 nucleotides.
Spot density may in some cases be as high
as 1,000/cm² with the number of spots in a single array
being at least 500, preferably at least about 1,000, and
in some cases being up to the order of 30,000. Materials
and methods for the production and use of such microarrays
are well known. A description of the construction
of a large-scale microarray containing unique polynucleotides
corresponding to individual mouse genes is
disclosed in United States Patent No. 6,077,673.
[0021] Methods for detection of hybridization events
in microarrays are also well known. Typically, hybridization
is detected through use of some form of label, all or
a component of which is typically placed on nucleic acids
in a sample to be exposed to the array. Examples of
such labels are fluorescent or radioactive compounds
that are typically joined to or incorporated into the cDNA.
High-throughput methods and apparatus are available
for detecting, recording, and analyzing patterns of labelling
resulting from hybridization on these arrays.
[0022] Test cells for use in this invention may be any 
cell for which some aspect of the expression of the cell's 
genome is to be determined or assessed. Ideally, the 
test cell will be of the same animal type as the ES cells
in the cell collection (library), although this invention
could use test cells from an organism different from that
from which the library is derived.
[0023] This invention is useful for comparison of dif- 
ferences in gene expression between different test cells. 
In such an embodiment, one such test cell may be con- 
sidered a standard for comparison to one or more other 
test cells of interest. Other test cells may be represent- 
ative of different biological states or phenotypes. For ex- 
ample, test cells may be representative of different 
states of differentiation, disease, neoplastic progres- 
sion, etc. 

[0024] Methods for obtaining cDNA from the mRNA 
pool of test cells are well known, as are methods for la- 
belling such cDNA to facilitate detection of hybridization 
of such cDNA to polynucleotides on a microarray. 
[0025] ES cells used in this invention may be from any 
eukaryotic organism from which such cells may be ob- 
tained and cultured. Mammals for which ES cells may 
be obtained and cultured include rodents (e.g. mice and 
rats), pigs, and humans. However, this invention does 
not include the generation of humans from ES cells. 
[0026] The present invention may also include the fa- 
cilitation of cloning of RACE-PCR products by incorpo- 
ration of a small selectable sequence between the spe- 
cific primer sequence used for RACE-PCR and an un- 
paired splice site on the gene trap vector, such that the 
small selectable marker is incorporated within the 
RACE-PCR product which facilitates its cloning. 
[0027] Accordingly, this invention provides an exon 
trap vector which is a preferable vector for use in gen- 
erating mutant ES cells for use in this invention. The vec- 
tor comprises in a 5' to 3' direction: 

(a) an unpaired splice acceptor; 

(b) a region encoding a reporter; 

(c) one or more polyadenylation signals; 

(d) a promoter functional in an ES cell; 

(e) a segment encoding a second reporter under
transcriptional control of the promoter; and,

(f) an unpaired splice donor, 

wherein the construct additionally comprises a selecta- 
ble region of 300 base pairs or less between (a) and (b) 
or between (e) and (f). The selectable region may encode
a selectable marker (such as supF), or the selectable
region may be a recombination site such as att, lox, or frt.
Preferably, the selectable region will be immediately ad- 
jacent to the sequence in the vector that has been de- 
signed or selected to be a primer target for PCR. 
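
The ordering and size constraints on the vector elements can be expressed as a simple check over an ordered list of elements. This is a sketch only: the element names, the list-of-tuples representation, and the element lengths in the example are hypothetical, and only the 5' to 3' order (a) to (f) and the 300 base pair limit come from the description above.

```python
from typing import List, Tuple

# Elements (a) to (f) in 5' to 3' order; names are illustrative labels only.
CORE_ORDER = ["splice_acceptor", "reporter", "polyA",
              "promoter", "second_reporter", "splice_donor"]

# A selectable region must sit between (a) and (b) or between (e) and (f).
ALLOWED_NEIGHBOURS = {("splice_acceptor", "reporter"),
                      ("second_reporter", "splice_donor")}


def is_valid_layout(elements: List[Tuple[str, int]],
                    max_selectable_bp: int = 300) -> bool:
    """Check element order and selectable-region placement and size."""
    names = [name for name, _ in elements]
    core: List[str] = []
    for name in names:
        if name == "selectable":
            continue
        # "One or more" polyadenylation signals: collapse consecutive repeats.
        if name == "polyA" and core and core[-1] == "polyA":
            continue
        core.append(name)
    if core != CORE_ORDER:
        return False
    found = False
    for i, (name, length) in enumerate(elements):
        if name != "selectable":
            continue
        if i == 0 or i == len(elements) - 1:
            return False
        if (names[i - 1], names[i + 1]) not in ALLOWED_NEIGHBOURS:
            return False
        if length > max_selectable_bp:
            return False
        found = True
    return found


# Hypothetical vector with a 25 bp selectable region between (a) and (b).
vector = [("splice_acceptor", 200), ("selectable", 25), ("reporter", 3000),
          ("polyA", 250), ("promoter", 500), ("second_reporter", 600),
          ("splice_donor", 50)]
print(is_valid_layout(vector))
```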



DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS OF THE INVENTION 

[0028] This invention includes indexing a library of genetically
altered cells and screening and isolating a particular
clone of interest from the library using high-throughput
DNA microarrays. The library is used as a
source for identifying and obtaining specifically mutated
cells, cell lines derived from the individually mutated
cells, and cells for use in the production of transgenic
non-human animals. This methodology provides an efficient
and rapid method for the identification of novel
genes, rapid determination of their chromosomal map position
and placement of genes on the physical map for
the generation of gene transcript maps for eukaryotic
genomes and simultaneous generation of gene knock-out
organisms for in vivo gene function analyses of corresponding
genes. This approach allows the expansion
of the scope of biological investigation from studying single
genes/proteins to studying all genes/proteins simultaneously.

[0029] The present invention encompasses an integrated
functional genomics strategy that combines
large-scale gene trap mutagenesis and tagging of gene
transcripts in ES cells with the high-throughput and versatile
nucleic acid microarray technology for genome-wide
expression analysis. The method involves the use
of DNA microarrays comprising signature DNA fragments
corresponding to trapped genes in each embryonic
stem cell gene trap clone and screening for identification
of differentially regulated trapped genes. The
microarrays are indexed to corresponding clones.
[0030] Gene trapping may be performed in mice using
a gene trap DNA construct comprising two functional
units. The first functional segment consists of a mutagenic,
detectable component that comprises an unpaired
splice acceptor sequence fused to an internal ribosomal
entry sequence (IRES) linked to a reporter gene
(e.g. β-galactosidase) followed by a polyadenylation
signal sequence (e.g. SA-IRES-βgal-pA). The second
functional unit encodes a selectable sequence acquisition
module consisting of a promoter such as
mouse phosphoglycerate kinase-1 (PGK) that is actively
transcribed in ES cells, fused to a reporter (e.g. the
puromycin N-acetyltransferase gene) followed by an unpaired
synthetic consensus splice donor sequence (e.g.
PGKpuroSD). A preferred vector comprises one or
more small selectable sequences less than 300 bp in
length that facilitate cloning of trapped genes by 5' or 3'
RACE-PCR. The DNA construct may have the unpaired
splice acceptor sequence upstream of a small selectable
sequence linked to the primer target sequence used
for 5' RACE-PCR. Alternatively, the DNA construct may
have the unpaired splice donor sequence downstream
of the primer target used for 3' RACE-PCR linked to a
small selectable sequence. Such small selectable sequences
of less than 300 bp include bacterial selectable
markers such as supF or site-specific recombination
sites such as attB, loxP or frt.

[0031] Transfection of the gene trap DNA construct
via electroporation into ES cells results in random integration
(the majority of which are single copy vector integration
events) into the ES cell genome by illegitimate
recombination. The selectable (e.g. PGKpuroSD) gene
cassette lacks a polyadenylation signal sequence.
Therefore, puromycin resistance from the exemplified
vector can only be achieved by splicing into downstream
exons and polyadenylation signal sequence of the
trapped endogenous gene. The trap vector not only introduces
a molecular tag that permits subsequent cloning
and identification, chromosomal localization and
placement onto the physical map of the trapped gene,
but also simultaneously generates ES cells bearing mutations
in the respective genes that facilitate generation
of knock-out mice.

[0032] Each ES cell trap clone obtained simultaneously
provides access to the following key pieces of information:
1) partial cDNA gene fragments corresponding
to the trapped genes can be cloned by rapid amplification
of cDNA ends (3' RACE-PCR); 2) the identity of
the novel genes trapped can be determined by obtaining
partial gene sequence information through high-throughput
DNA sequencing of RACE-PCR products; 3)
the chromosomal localization of the trapped genes can
be identified by fluorescence in situ hybridization (FISH)
mapping; 4) the genomic DNA sequence flanking the
site of integration can be rapidly cloned and sequenced
providing sequence information that will allow for rapid
placement of genes on DNA contigs or the physical
map; 5) the direct histochemical demonstration of the
pattern of gene expression (e.g. due to the presence of 
the LacZ reporter gene) in either chimeras or germline 
animals produced with ES cells can be attained; and, 6) 
in vivo gene function information can be obtained from 
phenotypic, physiologic, and biochemical analyses of 
ES cell-derived knock-out mice and cell lines. 
[0033] A partial or complete set of randomly geneti- 
cally altered cells is generated. For example, a library 
of ES cell gene trap clones is generated by random in- 
sertional mutagenesis using the above-described gene 
trap vector. Each trapped gene is cloned by 3' 
RACE-PCR. PCR products are then used for the fabrication
of DNA microarrays. Quantitative gene expression
analysis using DNA microarray hybridization is
subsequently performed in order to identify differentially
expressed genes in a variety of model systems. Gene
chip hybridization probes derived from test and control
cell or tissue samples are prepared from defined biological
systems, for example: neurodegenerative disease;
DNA repair; prostate cancer; adhesion signalling; macrophage
activation; immune tolerance and activation;
apoptosis; dendritic cell function; and liver regeneration.
An advantage of using the method of this invention
prior to DNA sequencing is that sequencing may then
be restricted to differentially regulated genes. This represents
a substantial economic saving.
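
Restricting sequencing to differentially regulated genes presupposes some rule for calling a spot differentially regulated. The sketch below uses a log2 ratio of test to control signal with a fixed cut-off, which is a common convention but is an assumption here and not a method stated in this disclosure. The function name, the signal floor, the cut-off, and the example values are hypothetical.

```python
import math
from typing import Dict


def differentially_regulated(test: Dict[str, float],
                             control: Dict[str, float],
                             min_abs_log2: float = 1.0,
                             floor: float = 1.0) -> Dict[str, float]:
    """Return log2(test/control) for spots whose change meets the cut-off."""
    result: Dict[str, float] = {}
    for spot in test.keys() & control.keys():
        # Clamp to a small floor so that zero signals do not cause division errors.
        ratio = max(test[spot], floor) / max(control[spot], floor)
        log_ratio = math.log2(ratio)
        if abs(log_ratio) >= min_abs_log2:
            result[spot] = log_ratio
    return result


# Hypothetical signals for three spots in a test and a control sample.
test_signals = {"A1": 800.0, "A2": 100.0, "A3": 50.0}
control_signals = {"A1": 100.0, "A2": 110.0, "A3": 400.0}

to_sequence = differentially_regulated(test_signals, control_signals)
print(sorted(to_sequence))
```

Only the PCR products behind the returned spots would then go forward as sequencing templates.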



[0034] Significant cross-regulation of certain gene
classes in multiple biological systems is anticipated. For
example, genes that are up-regulated in apoptosis, neurodegenerative
disease, and T-cell anergy may be conversely
down-modulated in cancer progression, liver regeneration,
T-cell activation, etc. Examination of so
many diverse genes gives a perspective on all the processes
that simultaneously occur within a model system.
The comparison of gene expression profiles between
model systems will provide new insight into the role of
genes in the context of multiple processes. Therefore,
this invention will be useful to identify gene families that
play common and unique functional roles in multiple
pathways and systems.

[0035] PCR products from corresponding differentially
regulated trapped genes identified by microarray hybridization
are used as DNA templates for sequencing.
ES gene trap clones may subsequently be selected for
chromosomal localization by FISH mapping. Flanking
genomic DNA sequences are cloned and subsequently
sequenced. Bioinformatic analysis of partial gene sequence
information and chromosomal localization is
then performed. By comparison of gene sequence and
chromosomal position with databases, information with
respect to whether the trapped genes are novel or
known, are part of a gene family, contain known functional
domains, etc. is then determined. Based on the
results of bioinformatics, specific ES cell clones are selected for
generation of knock-out mice and determination of in vivo
gene expression pattern. Homozygous mutant mice
and cell lines are then used for phenotypic, biochemical,
and physiologic analyses. Subsequent cycles of gene
identification may be performed using hybridization
probes derived from mutant mice and cell lines for further
rounds of microarray hybridization studies.

Generation of ES cell gene trap clones 

[0036] In the following example, gene trap mutagenesis
is performed in J1 ES cells. The J1 ES cell line was
chosen for the following reasons: 1) J1 cells are derived
from a 129 substrain that has been chosen as the source
of genomic DNA for the international mouse genome sequencing
project; and 2) J1 ES cells were originally derived
from an inbred homozygous genotype, allowing for
easy back-crossing to generate knock-outs in an inbred
background, and also allowing gene knock-outs to be out-crossed
onto an outbred background with a minimal
number of matings.
[0037] Gene trapping in ES cells is performed using
the gene trap DNA construct described above comprising
a mutagenic, detectable component (SA-IRES-βgal-pA)
and a selectable sequence acquisition module (PGKpuroSD).
Most gene trap events containing SA-βgal-pA
result in a null allele (Zambrowicz, et al. [supra],
Skarnes, et al. (1991) Genes Dev., 6(6):903-918). Moreover,
the expression of the reporter βgal gene is under
the control of the endogenous promoter. The pattern of
LacZ activity, therefore, mimics that of the endogenous
gene allowing for histological assessment of in vivo 
gene expression pattern. The internal ribosomal entry 
sequence (IRES) allows for reporter gene translation in- 
dependently of the reading frame of the splice junction. 
The PGKpuroSD component of the GST vector results 
in expression of the puromycin resistance gene as fu- 
sion transcripts with the 3' end containing downstream 
exons and the polyadenylation signal of tagged genes. 
This fusion transcript allows for the identification of the 
trapped genes by 3' RACE-PCR in undifferentiated ES 
cells, even if the genes are not expressed in ES cells. 
[0038] In order to facilitate cloning of the RACE-PCR
fragments, the Gateway™ cloning system by Gibco
BRL can be adapted to the PCR strategy of this invention
by introducing a 25 bp sequence corresponding to
the attB1 site just upstream of the splice donor or splice
acceptor sequence and downstream of the gene trap
vector specific primers used for RACE-PCR amplification.
The attB2 site is incorporated into the adaptor primer.
Cloning of the RACE-PCR fragments as described
below may then be facilitated by use of the Gateway™
selection systems. Alternatively, a supF gene may be
introduced between the gene trap vector specific primer
and the splice acceptor or donor site such that upon
RACE-PCR amplification the supF sequence is incorporated
into the PCR product. Subsequent cloning of
RACE-PCR products can then be efficiently performed
by selection into P3 plasmid-containing E. coli such as
MC1061/P3. P3 carries an amber AmpR and an amber
TetR gene.

[0039] Host cells are transformed by any of the well 
known methods, selected as being suitable for the par- 
ticular cell type. Electroporation or calcium phosphate 
mediated transfection are suitable for mammalian cells. 
A preferred method known for ES cells is electropora- 
tion. 

[0040] A library of gene trap ES cell clones, each harbouring
a mutation in a unique gene, is generated using
the gene trap DNA vector. Each ES cell clone has a variable
cell number and growth rate per well after colony
isolation. In order to normalize the ES cell numbers per
well after clone isolation, ES cells are trypsinized and
split into two plates after a few days of culture. One plate
is used to determine cell number using an MTT based
assay that is detected using an ELISA plate reader. The
ELISA Microplate Autoreader EL311™ is employed. Using
Bioworks™ software, discontinuous samples are
split by merely supplying a file containing cell number
data in a comma-delimited format, which is easily exported
from Excel™. ES cell clones at varying concentrations
in the source plate are individually replated by
the automated Biomek 2000™, resulting in consolidation
of clones having similar concentrations in the destination
96-well plates. After a few days of culture, three replica
96-well plates are generated using the Biomek
2000™ workstation. Two replica plates of ES cells are
cryo-preserved using an improved 96-well plate freezing
protocol for ES cells that allows long-term storage
(Udy and Evans, (1994) Biotechniques, 17(5):887-94;
Ure, et al. (1992) Trends in Genetics, 8(1):6; Chan and
Evans, (1991) Trends in Genetics, 7(3):76). All plates
are barcoded with unique identifiers.
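
The consolidation of clones by cell number can be sketched as follows. This is a minimal illustration only, not the Bioworks™ software or its file format: the column names source_well and cell_number, the function name consolidate, and the rule of simply sorting clones by cell number before filling destination plates are all assumptions made for the example.

```python
import csv
import io

WELLS_PER_PLATE = 96  # destination plate size used in the text


def consolidate(csv_text, wells_per_plate=WELLS_PER_PLATE):
    """Group clones of similar cell number onto the same destination plate.

    csv_text is comma-delimited text with the hypothetical columns
    'source_well' and 'cell_number'. Clones are sorted by cell number and
    dealt into destination plates in that order, so each plate receives
    clones with neighbouring cell numbers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    clones = [(row["source_well"], float(row["cell_number"])) for row in reader]
    clones.sort(key=lambda clone: clone[1])
    return [
        clones[start:start + wells_per_plate]
        for start in range(0, len(clones), wells_per_plate)
    ]


# Made-up readings for four clones, consolidated onto two-well "plates"
example = "source_well,cell_number\nA1,5200\nA2,900\nA3,5000\nA4,1100\n"
plates = consolidate(example, wells_per_plate=2)
```

In this made-up example the two sparse clones (A2, A4) land on the first destination plate and the two dense clones (A3, A1) on the second.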

[0041] A third replica plate of cells is used for isolation
of total polyA mRNA as template for reverse transcription
(RT) and 3' RACE-PCR (3' rapid amplification
of cDNA ends-polymerase chain reaction). RNA from ES
clones is isolated using a rapid, automated magnetic
bead-based mRNA isolation procedure. The Dynal mRNA
Direct™ protocol is automated using a Beckman Biomek
2000™ robotic workstation that is adapted with a
magnetic plate (Dynal XS-96T) placed on the work surface
of the robotic workstation. The 96-well plates containing
ES cell clones are processed automatically on
the workstation. Automated RNA extraction is linked
with thermocycling by integration of a PTC-200-MJ
Research™ thermocycler (with a robotic lid) adjacent to the
Biomek 2000™ workstation. The Bioworks™ software
program is capable of automated control set-up and activation
of the RT and 3' RACE-PCR reactions using universal
primers in the thermocycler. After the PCR run,
PCR products are transferred directly for PCR purification
using the magnetic bead based procedure called
solid-phase reversible immobilization (SPRI) of the
Whitehead Institute for Genomics Research.
[0042] Purified 3' RACE-PCR DNA fragments are
then used in preparation of high density DNA microarrays.

Preparation of DNA microarrays 

[0043] DNA microarray technology is generally performed
on two main types of solid substrates: glass
microarrays containing as many as 30,000 DNA spots
and nylon membranes containing as many as 5,000
DNA spots. Glass slides have several advantages as
described (Southern, et al., (1999) Nat. Gen. Suppl., 21:5-9):
1) target DNA is coupled covalently onto the treated
glass surface; 2) glass can withstand high temperature
and high ionic strength wash solutions and is non-porous,
so hybridization volumes can be kept to a minimum, which
enhances the kinetics of hybridization; 3) glass has virtually
no autofluorescence and very low non-specific
probe binding, which allows very low signals to be quantitated;
and, 4) two or more probes can be labelled with
different fluorochromes and hybridized together to detect
differential hybridization.
[0044] There are two mechanical aspects of microarray
technology: array spotters (robots) and array scanners
such as those described in Bowtell, (1999) Nat.
Gen. Suppl., 21:25-32. DNA array spotters are available
that have the capacity to spot up to 44,000 spots per
standard slide (20 mm x 50 mm). The SDDC-1™ DNA
arrayer by Engineering Services Inc. (Toronto, Ontario)
is suitable for the production of DNA arrays on glass
slides.
[0045] Microarrays are prepared by spotting PCR derived
DNA products each representing a single gene integration
or tag event as described above. The first
stage involves the spotting of 10,000 DNA targets onto 
a 20 mm x 20 mm area. Target DNA will be prepared 
and stored in master microtitre plates as described 
above. Positive controls to be spotted may include 15 
housekeeping genes, plasmid DNA, genomic DNA, and 
40 spots of GFP DNA. 
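
As a rough check on the spotting density implied by 10,000 targets in a 20 mm x 20 mm area, the centre-to-centre spacing can be estimated as below. The sketch assumes a square grid that fills the area evenly, which the text does not specify; the function name grid_pitch_um is hypothetical.

```python
import math


def grid_pitch_um(n_spots, side_mm):
    """Return (spots per side, centre-to-centre pitch in micrometres)
    for the smallest square grid that holds n_spots in a square area."""
    per_side = math.isqrt(n_spots)
    if per_side * per_side < n_spots:
        per_side += 1  # round up when n_spots is not a perfect square
    pitch_um = side_mm * 1000 / per_side
    return per_side, pitch_um


per_side, pitch = grid_pitch_um(10_000, 20)
print(per_side, pitch)  # 100 200.0
```

Under this assumption the array is a 100 x 100 grid with a pitch of about 200 micrometres.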

[0046] RACE-PCR DNA libraries in 96-well format
may be used for printing microarrays by direct spotting
onto glass slides. Microarrays may be prepared by spotting
PCR derived DNA products each representing a
single gene trap event as described above.
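
Because each spot derives from one well of a barcoded 96-well plate, a record associating array positions with clones can be built at printing time. The following is a minimal sketch under stated assumptions: spots are taken to be printed plate by plate, in row-major well order (A1 to H12), and laid out row by row on the array. The actual print order depends on the arrayer, and the names wells_96 and build_index are hypothetical.

```python
from string import ascii_uppercase


def wells_96():
    """Well names of a 96-well plate in row-major order: A1..A12, ..., H1..H12."""
    return [f"{row}{col}" for row in ascii_uppercase[:8] for col in range(1, 13)]


def build_index(plate_barcodes, spots_per_row):
    """Map array position (row, column) -> (plate barcode, well).

    Assumes spots are printed in plate order, then well order, filling the
    array row by row with spots_per_row spots in each row.
    """
    index = {}
    spot = 0
    for barcode in plate_barcodes:
        for well in wells_96():
            index[divmod(spot, spots_per_row)] = (barcode, well)
            spot += 1
    return index


# Two hypothetical plate barcodes printed onto an array 100 spots wide
index = build_index(["P001", "P002"], spots_per_row=100)
clone = index[(0, 96)]  # first well of the second plate
```

A hybridizing spot at a given (row, column) can then be traced back to the plate and well holding the corresponding ES cell clone.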
[0047] Printing, hybridization, scanning and analyses
of microarrays may be performed using the Total Array
System™ manufactured by BioRobotics as well as the
Virtex™ microarray scanner after co-hybridization of fluorescently
labelled probes. The primary expression array
data is analyzed by principal component analysis
and clustering analysis, using commercially available
software programs for image analysis and data extraction
such as ImaGene™ by BioDiscovery, ImageQuant™ by
Molecular Dynamics and AtlasImage™ by Clontech, and
using other publicly available software programs such as
the Cluster Analyses™ software program from Stanford
Genomics Resources and the ArrayView™ software from
the NIH National Human Genome Research Institute.

Preparation of cDNA test samples and exposure to 
array 

[0048] The Cy3 and Cy5 fluorescent labels have good 
incorporation efficiencies with reverse transcriptase, 
photostability and yield, and are widely separated in 
their excitation and emission spectra, allowing highly 
discriminating optical filtration. Alternatively, analyses 
may be performed using 33P-labelled cDNA.
[0049] Single stranded cDNA probes may be synthesized
from 5 µg of total RNA using reverse transcriptase
in reactions containing oligo d(T) primers, deoxynucleotides
and either Cy3-dUTP or Cy5-dUTP. Prior to labelling,
the RNA population will be spiked with 1 µg of GFP
RNA produced by in vitro transcription of a plasmid
clone with a T3/T7 RNA polymerase initiation signal.
This internal control serves to normalize labelling efficiency
between RNA preps, to confirm grid location, and
to measure uniformity of hybridization across the array.
Following reverse transcription, RNA will be degraded
by treatment with alkali and heat, and fluorescently labelled
cDNA purified using Qiagen™ DNA purification
columns.
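
One simple way the GFP internal control could be used to normalize labelling efficiency between two channels is sketched below. This is an illustration under stated assumptions, not the procedure of any named software: each channel is scaled by the median intensity of its GFP control spots, and a log2 ratio is then taken per spot. The spot identifiers and intensity values are made up.

```python
import math
from statistics import median


def normalize_by_spike(intensities, control_ids):
    """Divide every spot intensity by the median intensity of the control spots."""
    scale = median(intensities[spot] for spot in control_ids)
    return {spot: value / scale for spot, value in intensities.items()}


def log2_ratios(cy5, cy3, control_ids):
    """Return log2(Cy5/Cy3) per non-control spot after spike-in normalization."""
    n5 = normalize_by_spike(cy5, control_ids)
    n3 = normalize_by_spike(cy3, control_ids)
    return {
        spot: math.log2(n5[spot] / n3[spot])
        for spot in cy5
        if spot not in control_ids
    }


# Made-up intensities: the Cy5 prep labelled twice as efficiently as the Cy3 prep
cy3 = {"gfp1": 1000.0, "gfp2": 1000.0, "clone_17": 500.0}
cy5 = {"gfp1": 2000.0, "gfp2": 2000.0, "clone_17": 4000.0}
ratios = log2_ratios(cy5, cy3, control_ids={"gfp1", "gfp2"})
```

In this example the raw Cy5/Cy3 ratio for clone_17 is 8, but after correcting for the twofold difference in labelling efficiency the normalized ratio is 4 (log2 ratio of 2).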

[0050] Equivalent amounts of labelled cDNA probes 
will be combined and exposed to the microarray under 
a glass cover slip at 65° C for 8 hours. Slides will be 
washed under known high stringency conditions, dried, 
and scanned for fluorescence. 



[0051] The microarrays may be scanned for fluores- 
cence using the Molecular Dynamics Avalanche™ 1 
scanner with lasers specific for the fluorescence of 
these probes. 

DNA sequencing component 

[0052] The tagged genes that display changes in expression
in different disease states may be sequenced
using an Applied Biosystems™ model 373XL automated
laser sequencer.

FISH mapping for determining chromosomal 
location 


[0053] Fluorescence in situ hybridization (FISH) involves
labelling a cosmid, phage, plasmid or BAC/PAC
clone with a non-isotopic tag, such as biotin or digoxigenin.
The labelled probe is then hybridized to metaphase
spreads, and the fluorescent signals are detected
at the site of hybridization to homologous sequences
at one chromosome band location.
[0054] A universal probe consisting of the approximately
10 kb Gene Sequence Tag DNA vector is used
for mapping experiments of all GST integration events
in ES clones identified. The use of a universal probe allows
efficient sample throughput. The probes are labelled
with biotin-14-dUTP or digoxigenin (DIG)-14-dUTP
by nick translation. Duplicate slides are run for
each probe. On average, it takes 2-4 FISH laboratory
experiments to obtain signals adequate to complete the
mapping using 4',6-diamidino-2-phenylindole dihydrochloride
(DAPI) banding. Since the characteristics of universal
probe hybridization to ES cell chromosomal DNA are
optimized, the determination of chromosomal localization
by FISH is more efficient and high-throughput. The
chromosome position may be confirmed by cytogeneticists.
Images may be captured as TIFF files, converted
to JPEG format, and subsequently analysed.


Database 

[0055] The data gathered from the DNA sequencing
and FISH treatment of the ES cell gene tag clones may
be compiled in a database.
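
A database of this kind could take many forms. The sketch below shows one possible layout using Python's built-in sqlite3 module. The table name, column names and helper functions are hypothetical, chosen only to show sequence and FISH data keyed by the barcoded plate and well of each clone, and the example values are made up.

```python
import sqlite3


def open_clone_db(path=":memory:"):
    """Open (or create) a clone database with one row per ES cell clone."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS clone (
               plate_barcode   TEXT NOT NULL,
               well            TEXT NOT NULL,
               tag_sequence    TEXT,   -- sequence of the 3' RACE-PCR product
               chromosome_band TEXT,   -- location determined by FISH
               PRIMARY KEY (plate_barcode, well)
           )"""
    )
    return conn


def add_clone(conn, plate_barcode, well, tag_sequence=None, chromosome_band=None):
    """Insert a clone record, replacing any earlier record for the same well."""
    conn.execute(
        "INSERT OR REPLACE INTO clone VALUES (?, ?, ?, ?)",
        (plate_barcode, well, tag_sequence, chromosome_band),
    )
    conn.commit()


def clones_on_band(conn, band):
    """Return (plate barcode, well) for every clone mapped to a chromosome band."""
    rows = conn.execute(
        "SELECT plate_barcode, well FROM clone "
        "WHERE chromosome_band = ? ORDER BY plate_barcode, well",
        (band,),
    )
    return rows.fetchall()


conn = open_clone_db()
add_clone(conn, "P001", "A1", tag_sequence="ACGT", chromosome_band="11B2")
add_clone(conn, "P001", "A2", tag_sequence="GGCC", chromosome_band="4A1")
hits = clones_on_band(conn, "11B2")
```

Keying records on plate barcode and well keeps the database aligned with the physical location of each cryo-preserved clone.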

Gene Knock-Out Chimeric Mouse Generation 

[0056] Gene knock-out chimeric mice may be generated
from targeted ES cell clones. Weekly microinjections
of one to two different ES cell clones into blastocysts
for the production of chimeric mice will be performed
(approximately 4 to 6 chimeric mice typically
will be produced per ES cell clone). Chimeric mice are
maintained until germline transmission is achieved, then
subjected to further breeding for generation of homozygous
mutations, and for specific phenotypic analysis.

Microarray analyses example

[0057] The Shionogi carcinoma closely mimics the response
of human prostate cancer to androgen withdrawal
therapy and is a good mouse tumour model system
for studying progression to androgen independence.
Approximately 5 x 10^6 cells of the parent, androgen
dependent Shionogi tumour were injected subcutaneously
into individual male mice of the DDS strain and
the tumours were allowed to grow for about 17-20 days,
attaining a weight of ~3 g. The host animals were then
castrated and subsequently sacrificed at 1 day, 2 days, 4
days, and after tumour recurrence. Recurrent, androgen
independent tumours (with a mass of ~1 g) are seen
between 20-30 days after castration. At each time point,
total RNA is extracted from the regressing and recurrent
tumours and reverse-transcribed to produce fluorescently
labelled cDNA for hybridization with gene trap
microarrays as described above. Changes in the expression
of 290 genes corresponding to ES gene trap clones
in Shionogi tumours at days 1, 2, and 4 post castration
and in androgen independent tumours were analyzed
using the TreeView™ hierarchical
clustering software program (M. Eisen, Lawrence
Berkeley National Laboratory), which identified a cluster
of genes that is highly induced following castration and
is subsequently down-regulated with progression to androgen
independence.
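
The kind of grouping that hierarchical clustering produces can be illustrated with a small self-contained sketch. It is not the TreeView™ program: it is a plain average-linkage agglomerative clustering using 1 minus the Pearson correlation as the distance, and the clone names and expression values (log ratios at days 1, 2 and 4 post castration and in the recurrent tumour) are made up for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    The distance between clusters is 1 - Pearson r, averaged over all pairs
    of members (average linkage).
    """
    clusters = [[name] for name in profiles]

    def distance(a, b):
        return mean(1 - pearson(profiles[g], profiles[h]) for g in a for h in b)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up log ratios: day 1, day 2, day 4, recurrent tumour
profiles = {
    "clone_A": [1.0, 2.0, 3.0, 0.2],     # induced, then down-regulated
    "clone_B": [0.8, 1.9, 2.7, 0.1],     # same pattern as clone_A
    "clone_C": [-1.0, -2.0, -2.5, 0.0],  # repressed after castration
    "clone_D": [0.1, 0.0, 0.2, 2.0],     # rises only on recurrence
}
groups = average_linkage(profiles, n_clusters=3)
```

With these values clone_A and clone_B are merged first, reflecting their shared pattern of induction after castration followed by down-regulation in the recurrent tumour.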

[0058] Although the foregoing invention has been described
in some detail by way of illustration and example
for purposes of clarity of understanding, it will be readily
apparent to those of skill in the art in light of the teachings
of this invention that changes and modifications may
be made thereto without departing from the spirit or
scope of the appended claims. All patents, patent applications
and publications referred to herein are hereby
incorporated by reference.



Claims

1. A method for selecting a clone of a cell library containing
a mutation in a gene that is expressed in a
test cell comprising:

(a) providing cDNA obtained by reverse transcription
of mRNA of the test cell;

(b) providing a collection of cultured cells of
said library organized into individual clones,
wherein each clone is of a cell having a mutation
in an exon in its genome, the mutation being
in a different exon in cells of different clones;

(c) providing an array of different single stranded
polynucleotides, the polynucleotides being
fragments of exons containing mutations in (b);

(d) exposing the cDNA to the array under conditions
permitting hybridization of polynucleotides
in the array to nucleic acids;

(e) detecting hybridization of cDNA to a polynucleotide
on the array; and,

(f) selecting a clone in the collection from which
a hybridizing polynucleotide detected at (c) is
an exon fragment.

2. A method for comparing gene expression between 
test cells, comprising: 

(a) providing at least two cDNA samples, each 
sample obtained by reverse transcription of 
mRNA of a different test cell; 

(b) providing a collection of cultured cells of a 
cell library organized into individual clones, 
wherein each clone is of a cell having a muta- 
tion in an exon of its genome, the mutation be- 
ing in a different exon in cells of different clones; 

(c) providing at least one array of different sin- 
gle stranded polynucleotides, the polynucle- 
otides being fragments of exons containing mu- 
tations in (b); 

(d) exposing the cDNA samples to the at least 
one array under conditions permitting hybridi- 
zation of polynucleotides on the array to nucleic 
acids; 

(e) detecting hybridization of polynucleotides in 
the at least one array resulting from exposure 
to cDNA; 

(f) selecting clones in the collection from which 
hybridizing polynucleotides detected at (e) are 
exon fragments; and, 

(g) comparing a clone or clones which comprise
exon fragments that hybridize to one of the cDNA
samples to a clone or clones which comprise
exon fragments that hybridize to another
of the cDNA samples.

3. The method of claim 1 or claim 2, wherein mutations 
in the cell library are as a result of introducing an 
exon trap vector into the source cells of said library.

4. The method of any one of claims 1 to 3, wherein the 
source cells of said cell library are murine.

5. The method of any one of claims 1 to 4, wherein the 
cells of said cell library are ES cells. 

6. The method of any one of claims 1 to 5, wherein the 
array is a nucleic acid microarray. 

7. The method of claim 6, wherein the microarray com- 
prises at least 500 different polynucleotides on a 
solid support surface. 

8. The method of claim 7, wherein the microarray comprises
at least about 1,000 different polynucleotides.

9. The method of any one of claims 1-8, wherein the
cDNA is labelled to facilitate detection at (e). 

10. The method of claim 9, wherein the label is fluores- 
cent or radioactive. 

11. The method of any one of claims 1-10, wherein selecting
a clone comprises physically segregating a
sample of cells from a selected clone.

12. A system for testing expression of a gene in a test 
cell, comprising: 

(a) a collection of cultured cells of a cell library 
organized into individual clones, wherein each 
clone is of a cell having a mutation in an exon 
of its genome, the mutation being in a different 
exon in cells of different clones; and, 

(b) an array comprising at least 500 different 
single stranded polynucleotides on a solid sup- 
port surface, the polynucleotides being frag- 
ments of the exons containing mutations in (a). 

13. The system of claim 12, wherein the cells of said 
cell library are ES cells.

14. The system of claim 12 or claim 13, wherein the array
comprises at least about 1,000 different polynucleotides.

15. The system of claim 14, wherein the array comprises
at least about 10,000 different polynucleotides.

16. The system of any one of claims 12 to 15, wherein 
the array is a nucleic acid microarray. 

17. The system of any one of claims 12 to 16, wherein 
the system additionally comprises a recorded index 
associating a position in the array at which a poly- 
nucleotide is present, to a clone comprising that 
polynucleotide in an exon in which there is a muta- 
tion. 



18. The system of claim 17, wherein the recorded index
is stored on a computer-readable medium.

19. The system of any one of claims 13 to 18, wherein
the ES cells are murine.

20. An exon trap vector comprising, in a 5' to 3' direction:

(a) an unpaired splice acceptor;

(b) a region encoding a reporter;

(c) one or more polyadenylation signals;

(d) a promoter functional in an ES cell;

(e) a segment encoding a second reporter under
transcriptional control of promoter (d); and,

(f) an unpaired splice donor,

wherein the construct additionally comprises
a selectable region of 300 base pairs or less between
(a) and (b) or between (e) and (f).

21. The vector of claim 20, wherein the selectable region
encodes a selectable marker.

22. The vector of claim 20, wherein the selectable region
is supF.

23. The vector of claim 20, wherein the selectable region
is a recombination site.

24. The vector of claim 23, wherein the recombination
site is selected from the group consisting of: att, lox,
and frt.